The asymptotic normality of U-statistics has so far been proved for iid data and under various mixing conditions such as absolute regularity, but not for strong mixing. We use a coupling technique introduced in 1983 by Bradley [6] to prove a new generalized covariance inequality similar to Yoshihara's [27]. It follows from the Hoeffding decomposition and this inequality that U-statistics of strongly mixing observations converge to a normal limit if the kernel of the U-statistic fulfills some moment and continuity conditions.

The validity of the bootstrap for U-statistics has until now only been established in the case of iid data (see Bickel and Freedman [4]). For mixing data, Politis and Romano [23] proposed the circular block bootstrap, which leads to a consistent estimation of the sample mean's distribution. We extend these results to U-statistics of weakly dependent data and prove a CLT for the circular block bootstrap version of U-statistics under absolute regularity and strong mixing. We also calculate a rate of convergence for the bootstrap variance estimator of a U-statistic and give some simulation results.

1 U-Statistic CLT

U-statistics are a broad class of nonlinear functionals, including many well-known examples such as the variance estimator or the Cramér-von Mises statistic. For simplicity of notation, we concentrate on the case of bivariate U-statistics.

Definition 1.1. A U-statistic with a symmetric and measurable kernel $h: \mathbb{R}^2 \to \mathbb{R}$ is defined as

$$U_n(h) = \frac{2}{n(n-1)}\sum_{1\le i<j\le n} h(X_i, X_j).$$
$U_n(h)$ is the uniformly minimum variance unbiased estimator of $\theta = E[h(X_1, X_2)]$ if $X_1, \ldots, X_n$ are iid with an arbitrary absolutely continuous distribution. To prove asymptotic normality of U-statistics, Hoeffding [15] decomposed $U_n(h)$ as follows:

$$U_n(h) = \theta + \frac{2}{n}\sum_{i=1}^{n} h_1(X_i) + \frac{2}{n(n-1)}\sum_{1\le i<j\le n} h_2(X_i, X_j)$$

with

$$h_1(x) := E\,h(x, X_2) - \theta,$$
$$h_2(x, y) := h(x, y) - h_1(x) - h_1(y) - \theta.$$

The linear part $\sum_{i=1}^{n} h_1(X_i)$ is a sum of iid random variables with a normal limit distribution. The remaining term $\frac{2}{n(n-1)}\sum_{1\le i<j\le n} h_2(X_i, X_j) = U_n(h_2)$ is called the degenerate part of the U-statistic. Because its summands are uncorrelated, $\sqrt{n}\,U_n(h_2)$ converges to zero in probability, and therefore the U-statistic is asymptotically normal.
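As a sanity check of the Hoeffding decomposition, the following sketch evaluates both sides numerically. Purely for illustration, it assumes iid standard normal observations and the kernel $h(x, y) = \frac{1}{2}(x - y)^2$ from Example 1.5. For these choices $\theta = 1$, $h_1(x) = \frac{1}{2}(x^2 - 1)$ and $h_2(x, y) = -xy$. The helper names are hypothetical.

```python
import numpy as np

THETA = 1.0  # E h(X_1, X_2) for iid N(0, 1) data and h(x, y) = (x - y)^2 / 2

def h(x, y):
    return 0.5 * (x - y) ** 2

def h1(x):
    # E h(x, X_2) - theta for X_2 ~ N(0, 1)
    return 0.5 * (x ** 2 - 1.0)

def h2(x, y):
    # degenerate part of the kernel; algebraically equal to -x * y here
    return h(x, y) - h1(x) - h1(y) - THETA

def u_stat(x, kernel):
    """2 / (n (n - 1)) * sum over all pairs i < j of kernel(x_i, x_j)."""
    n = len(x)
    i, j = np.triu_indices(n, k=1)
    return 2.0 * kernel(x[i], x[j]).sum() / (n * (n - 1))

rng = np.random.default_rng(1)
x = rng.normal(size=200)
n = len(x)

lhs = u_stat(x, h)
rhs = THETA + 2.0 / n * h1(x).sum() + u_stat(x, h2)
print(lhs, rhs)                     # equal up to floating-point rounding
print(np.sqrt(n) * u_stat(x, h2))   # sqrt(n) * U_n(h_2), the degenerate part
```

The identity holds for every sample, because it only rearranges $h = \theta + h_1(x) + h_1(y) + h_2(x, y)$. The distributional assumption enters only through the values used for $\theta$ and $h_1$.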

Under dependence, the summands of the degenerate part can be correlated, and this can change the limit distribution. Under the strong assumption of φ-mixing, Sen [24] showed that U-statistics are asymptotically normal. Yoshihara assumed $X_1, \ldots, X_n$ to be stationary and absolutely regular and proved a CLT for U-statistics under this weaker condition (for a detailed description of the various mixing conditions, see Doukhan [11] and Bradley [7]).

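Definition 1.2. A sequence $(X_n)_{n\in\mathbb{N}}$ of random variables is called absolutely regular if

$$\beta(m) := \sup\left\{\beta\big((X_1, \ldots, X_k), (X_i)_{i\ge k+m}\big) \,\middle|\, k \in \mathbb{N}\right\} \xrightarrow{m\to\infty} 0,$$

where $\beta$ is the absolute regularity coefficient defined as

$$\beta(Y, Z) := E\left[\sup_{A\in\sigma(Y)} \big|P[A \mid Z] - P[A]\big|\right].$$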

Yoshihara proved the asymptotic normality of the U-statistic $U_n(h)$ using a generalized covariance inequality. As the distance between the indices $i_1, i_2, i_3, i_4$ increases, the covariance of $h_2(X_{i_1}, X_{i_2})$ and $h_2(X_{i_3}, X_{i_4})$ becomes smaller, and therefore the degenerate part vanishes as in the independent case.

Denker and Keller [10] have weakened the mixing assumption to functionals of absolutely regular processes; Borovkova, Burton and Dehling [5] showed convergence of the empirical U-process to a Gaussian process. Recently, Hsing and Wu [16] proved a CLT for weighted U-statistics of processes that have the form $X_n = F(\ldots, \epsilon_{n-2}, \epsilon_{n-1}, \epsilon_n)$, where $(\epsilon_n)_{n\in\mathbb{Z}}$ is an i.i.d. process.

We want to extend Yoshihara's CLT to random variables which fulfill the strong mixing condition:
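Definition 1.3. A sequence $(X_n)_{n\in\mathbb{N}}$ of random variables is called strongly mixing if

$$\alpha(m) := \sup\left\{\alpha\big((X_1, \ldots, X_k), (X_i)_{i\ge k+m}\big) \,\middle|\, k \in \mathbb{N}\right\} \xrightarrow{m\to\infty} 0,$$

where $\alpha$ is the strong mixing coefficient defined as

$$\alpha(Y, Z) := \sup_{A\in\sigma(Y),\, B\in\sigma(Z)} \big|P(A\cap B) - P(A)P(B)\big|.$$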

Strong mixing is weaker than absolute regularity, but absolute regularity and strong mixing are equivalent for random variables which take their values in a finite set. One can approximate general random variables by such discrete ones. To make this discretization work for U-statistics, we impose a continuity condition on the kernel that is not needed in the case of absolutely regular data:

Definition 1.4. Let $(X_n)_{n\in\mathbb{N}}$ be a stationary process. A kernel $h$ is called $\mathcal{P}$-Lipschitz-continuous if there is a constant $L > 0$ with

$$E\left[\,|h(X, Y) - h(X', Y)|\,\mathbb{1}_{\{|X - X'| \le \epsilon\}}\right] \le L\epsilon$$

for every $\epsilon > 0$, every pair $X, Y$ with the common distribution $P_{X_1, X_k}$ for a $k \in \mathbb{N}$ or $P_{X_1} \times P_{X_1}$, and $X', Y$ also with one of these common distributions.

$\mathcal{P}$-Lipschitz-continuity is a special case of the p-continuity introduced by Borovkova, Burton and Dehling [5]. Every Lipschitz-continuous kernel is $\mathcal{P}$-Lipschitz-continuous, since $|h(X, Y) - h(X', Y)| \le L|X - X'| \le L\epsilon$ on the event $\{|X - X'| \le \epsilon\}$. However, the definition also holds for many kernels that are not Lipschitz-continuous in the ordinary sense:

Example 1.5 (Variance estimation). Consider stationary random variables with finite variance and the kernel $h(x, y) = \frac{1}{2}(x - y)^2$. The related U-statistic is the well-known variance estimator

$$U_n(h) = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2.$$

For random variables $X$, $X'$ and $Y$ as above, we get:

$$E\left[|h(X, Y) - h(X', Y)|\,\mathbb{1}_{\{|X-X'|\le\epsilon\}}\right] = \frac{1}{2}E\left[|X - X'|\,|X + X' - 2Y|\,\mathbb{1}_{\{|X-X'|\le\epsilon\}}\right] \le \frac{1}{2}\epsilon\, E\left[|X + X' - 2Y|\right] \le 2\epsilon\, E|X_1|.$$

This proves the $\mathcal{P}$-Lipschitz-continuity of $h$.
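The bound in Example 1.5 can be checked by simulation. The sketch below assumes iid standard normal $X$, $X'$, $Y$, which is the product-distribution case of Definition 1.4; for these, $E|X_1| = \sqrt{2/\pi}$. The function name and the Monte Carlo size are illustrative choices.

```python
import numpy as np

def lipschitz_lhs(eps, n_mc=200_000, seed=0):
    """Monte Carlo estimate of E[|h(X,Y) - h(X',Y)| 1{|X - X'| <= eps}]
    for h(x, y) = (x - y)^2 / 2 and X, X', Y iid N(0, 1)."""
    rng = np.random.default_rng(seed)
    x, x_prime, y = rng.normal(size=(3, n_mc))
    diff = np.abs(0.5 * (x - y) ** 2 - 0.5 * (x_prime - y) ** 2)
    return np.mean(diff * (np.abs(x - x_prime) <= eps))

E_abs_X = np.sqrt(2 / np.pi)   # E|X_1| for a standard normal variable
for eps in (0.5, 0.1, 0.02):
    # estimate vs. the bound 2 * eps * E|X_1| from Example 1.5
    print(eps, lipschitz_lhs(eps), 2 * eps * E_abs_X)
```

In this Gaussian setting the estimate lies far below the bound. The bound uses only $E|X + X' - 2Y| \le 4E|X_1|$ and ignores the small probability of $\{|X - X'| \le \epsilon\}$.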
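Example 1.6 (Dimension estimation). Let $t > 0$. The kernel $h(x, y) = \mathbb{1}_{\{|x-y|\le t\}}$ is related to the Grassberger-Procaccia dimension estimator [12]. It is $\mathcal{P}$-Lipschitz-continuous if there is an $L > 0$ such that, for all $\epsilon > 0$ and every common distribution of $X$ and $Y$ from Definition 1.4,

$$P[t - \epsilon \le |X - Y| \le t + \epsilon] \le L\epsilon.$$

The difference between $\mathbb{1}_{\{|X-Y|\le t\}}$ and $\mathbb{1}_{\{|X'-Y|\le t\}}$ is non-zero iff $|X - Y| \le t$ and $|X' - Y| > t$, or the other way round. As $|X - X'| \le \epsilon$, it follows that $t - \epsilon \le |X - Y| \le t + \epsilon$. Therefore

$$E\left[\big|\mathbb{1}_{\{|X-Y|\le t\}} - \mathbb{1}_{\{|X'-Y|\le t\}}\big|\,\mathbb{1}_{\{|X-X'|\le\epsilon\}}\right] \le P[t - \epsilon \le |X - Y| \le t + \epsilon] \le L\epsilon.$$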
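For the kernel of Example 1.6, the U-statistic is the proportion of pairs that lie within distance $t$ of each other, the correlation sum behind the Grassberger-Procaccia estimator. The sketch below computes it. It then checks the displayed probability condition by simulation in one assumed setting, iid Uniform(0, 1) observations, where $L = 4$ suffices. The function name is hypothetical.

```python
import numpy as np

def correlation_sum(x, t):
    """U-statistic with kernel h(x, y) = 1{|x - y| <= t}: the fraction of pairs i < j
    whose observations are at most t apart."""
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(len(x), k=1)
    return np.mean(np.abs(x[i] - x[j]) <= t)

print(correlation_sum([0.0, 1.0, 3.0], t=1.5))   # only the pair (0, 1) qualifies: 1/3

# Condition of Example 1.6 for iid Uniform(0, 1) pairs: |X - Y| has density 2(1 - s) <= 2,
# so P[t - eps <= |X - Y| <= t + eps] <= 4 * eps, i.e. L = 4 works in this case.
rng = np.random.default_rng(0)
u, v = rng.uniform(size=(2, 100_000))
t, eps = 0.3, 0.05
p = np.mean(np.abs(np.abs(u - v) - t) <= eps)
print(p, 4 * eps)
```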

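Example 1.7 ($\mathcal{P}$-Lipschitz-discontinuity). Consider the kernel $h(x, y) = \mathbb{1}_{\{x>0\}} + \mathbb{1}_{\{y>0\}}$ and let the $X_i$ have the density $f(t) = \frac{1}{6}|t|^{-2/3}\,\mathbb{1}_{[-1,1]\setminus\{0\}}(t)$. Then for independent random variables $X$, $X'$ and $Y$ with density $f$

$$E\left[|h(X, Y) - h(X', Y)|\,\mathbb{1}_{\{|X-X'|\le\epsilon\}}\right] \ge P\left[X \in \left(0, \tfrac{\epsilon}{2}\right]\right] P\left[X' \in \left[-\tfrac{\epsilon}{2}, 0\right)\right] = \frac{1}{4}\sqrt[3]{\frac{\epsilon^2}{4}}.$$

This kernel $h$ is therefore not $\mathcal{P}$-Lipschitz-continuous: the probability mass is concentrated in the neighborhood of the jump of $h$.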

The examples show that whether the kernel $h$ is $\mathcal{P}$-Lipschitz-continuous depends not only on $h$, but also on the distribution $\mathcal{P}$ of the process. We extend the CLT for U-statistics to strongly mixing data using the Hoeffding decomposition and a new generalized covariance inequality. The strong mixing assumption is weaker than absolute regularity (as in Yoshihara's CLT), but this comes at the price of more technical conditions: a faster decay of the mixing coefficients, some finite moments of $X_1$ and the additional $\mathcal{P}$-Lipschitz-continuity of the kernel.

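Theorem 1.8. Let $(X_n)_{n\in\mathbb{N}}$ be a stationary, mixing process and $h$ a kernel such that for a $\delta > 0$, $M > 0$:

$$\int\!\!\int |h(x_1, x_2)|^{2+\delta}\, dF(x_1)\, dF(x_2) \le M,$$
$$\forall k \in \mathbb{N}_0: \int |h(x_1, x_{1+k})|^{2+\delta}\, dP(x_1, x_{1+k}) \le M.$$

If one of the following two conditions holds:

• for a $\delta' \in (0, \delta)$: $\beta(n) = O\big(n^{-\frac{2+\delta'}{\delta'}}\big)$,
• $h$ is $\mathcal{P}$-Lipschitz-continuous, $E|X_1|^\gamma < \infty$ for a $\gamma > 0$, and $\alpha(n) = O(n^{-\rho})$ for a $\rho > \frac{3\gamma\delta + \delta + 5\gamma + 2}{2\gamma\delta}$,

then

$$\sqrt{n}\,\big(U_n(h) - \theta\big) \xrightarrow{\mathcal{D}} N(0, 4\sigma^2)$$

with $\sigma^2 = \mathrm{Var}[h_1(X_1)] + 2\sum_{k=1}^{\infty}\mathrm{Cov}[h_1(X_1), h_1(X_{1+k})]$.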

2 Bootstrap for U-statistics 



There is a variety of block bootstrap methods (see Lahiri [20]); we consider the circular block bootstrap introduced by Politis and Romano [23]. Instead of the original sample of $n$ observations with an unknown distribution, we construct new samples $X_1^*, \ldots, X_{bl}^*$ as follows. Extend the sample periodically by $X_{i+n} = X_i$. Choose a block of $l = l_n$ consecutive observations of the sample at random, and repeat this $b = \lfloor n/l \rfloor$ times independently. That is, for $j = 1, \ldots, n$ and $k = 0, \ldots, b-1$,

$$P^*\big[(X_{kl+1}^*, \ldots, X_{(k+1)l}^*) = (X_j, \ldots, X_{j+l-1})\big] = \frac{1}{n},$$

where $P^*$ is the bootstrap distribution conditionally on $(X_n)_{n\in\mathbb{N}}$, and $E^*$ and $\mathrm{Var}^*$ are the conditional expectation and variance. Note that $E^*[X_1^*] = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}$. For strongly mixing stationary processes, Shao and Yu [25] proved two results about the bootstrap version of the sample mean, $\bar{X}^* = \frac{1}{bl}\sum_{i=1}^{bl} X_i^*$. First, it has almost surely the same asymptotic distribution as the sample mean $\bar{X}$. Second, the variances of $\sqrt{bl}\,\bar{X}^*$ and of $\sqrt{n}\,\bar{X}$ converge to the same limit.
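The resampling scheme above can be sketched in a few lines. This is a minimal illustration assuming NumPy's random generator, not the authors' code; the function name is hypothetical.

```python
import numpy as np

def circular_block_bootstrap(x, block_len, rng):
    """Return one resample X*_1, ..., X*_{b*l}: the sample is extended periodically
    (X_{i+n} = X_i) and b = floor(n / l) blocks of l consecutive observations are
    drawn independently, each starting at a uniformly chosen index."""
    x = np.asarray(x)
    n = len(x)
    b = n // block_len
    starts = rng.integers(0, n, size=b)                   # P*[block starts at X_j] = 1/n
    idx = (starts[:, None] + np.arange(block_len)) % n    # wrap around the end of the sample
    return x[idx.ravel()]

rng = np.random.default_rng(42)
print(circular_block_bootstrap(np.arange(10.0), block_len=3, rng=rng))
```

Every block start is equally likely and the sample is wrapped around, so each $X_i^*$ is uniformly distributed over $X_1, \ldots, X_n$. This is why $E^*[X_1^*] = \bar{X}$ holds exactly for the circular scheme.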



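As the block length $l$ increases, the bias of the bootstrap variance estimator $\mathrm{Var}^*\big[\sqrt{bl}\,\bar{X}^*\big]$ becomes smaller and its variance becomes bigger. Minimizing the mean squared error (MSE) of $\mathrm{Var}^*\big[\sqrt{bl}\,\bar{X}^*\big]$ gives the following rate of convergence (see Lahiri [19]):

$$\min_{l}\, \mathrm{MSE}\Big(\mathrm{Var}^*\big[\sqrt{bl}\,\bar{X}^*\big]\Big) = O\big(n^{-2/3}\big).$$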

Naik-Nimbalkar and Rajarshi [21] have shown that the consistency of the block bootstrap also holds for the empirical process. Furthermore, the block bootstrap is valid for smooth functions of means and differentiable functionals of the empirical process (e.g. L-statistics), as well as for M-estimators; see the book of Lahiri [20], Chapter 4.

The bootstrap for U-statistics has so far only been studied in the independent case. This work began with Bickel and Freedman [4] and was extended to degenerate U-statistics by Arcones and Giné [1] and Dehling and Mikosch [9], to studentized U-statistics by Helmers [14], and to the weighted bootstrap by Janssen [18].

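To bootstrap U-statistics from time series, one can resample blocks of observations and plug them in:

$$U_n^*(h) = \frac{2}{bl(bl-1)}\sum_{1\le i<j\le bl} h(X_i^*, X_j^*) = \theta + \frac{2}{bl}\sum_{i=1}^{bl} h_1(X_i^*) + \frac{2}{bl(bl-1)}\sum_{1\le i<j\le bl} h_2(X_i^*, X_j^*).$$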
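The bootstrap variance $\mathrm{Var}^*[\sqrt{bl}\,U_n^*(h)]$ that appears in the results below has no simple closed form for a general kernel, but it can be approximated by Monte Carlo resampling. The following is a minimal sketch under several assumptions: the circular scheme described above, a user-supplied vectorized kernel, and an arbitrary number `n_boot` of resamples. All names are hypothetical.

```python
import numpy as np

def u_stat(x, kernel):
    """2 / (n (n - 1)) * sum over pairs i < j of kernel(x_i, x_j)."""
    n = len(x)
    i, j = np.triu_indices(n, k=1)
    return 2.0 * kernel(x[i], x[j]).sum() / (n * (n - 1))

def cbb_resample(x, block_len, rng):
    """Circular block bootstrap resample of length b * l."""
    n = len(x)
    starts = rng.integers(0, n, size=n // block_len)
    return x[((starts[:, None] + np.arange(block_len)) % n).ravel()]

def bootstrap_variance(x, kernel, block_len, n_boot=500, seed=0):
    """Monte Carlo approximation of Var*[sqrt(bl) U*_n(h)]."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    bl = (len(x) // block_len) * block_len
    reps = [np.sqrt(bl) * u_stat(cbb_resample(x, block_len, rng), kernel)
            for _ in range(n_boot)]
    return np.var(reps, ddof=1)

def var_kernel(a, b):
    # kernel of Example 1.5
    return 0.5 * (a - b) ** 2

# Illustrative data: iid N(0, 1) and a block length close to n^(1/3).
x = np.random.default_rng(3).normal(size=200)
v = bootstrap_variance(x, var_kernel, block_len=6)
print(v)   # roughly 2, the limiting variance of sqrt(n) * (sample variance - 1) here
```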
We show that for strongly mixing data the circular block bootstrap version of a U-statistic has the same asymptotic variance and the same normal limit distribution as the U-statistic itself.

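Theorem 2.1. Let $(X_n)_{n\in\mathbb{N}}$ be a stationary, mixing process and $h$ a kernel such that for a $\delta > 0$, $M > 0$:

$$\int\!\!\int |h(x_1, x_2)|^{2+\delta}\, dF(x_1)\, dF(x_2) \le M,$$
$$\forall k \in \mathbb{N}_0: \int |h(x_1, x_{1+k})|^{2+\delta}\, dP(x_1, x_{1+k}) \le M.$$

Let $l$ be the block length with $l \to \infty$ and $l = O(n^{1-\epsilon})$ for some $\epsilon > 0$. If one of the following two conditions holds:

• for a $\delta' \in (0, \delta)$: $\beta(n) = O\big(n^{-\frac{2+\delta'}{\delta'}}\big)$,
• $h$ is $\mathcal{P}$-Lipschitz-continuous, $E|X_1|^\gamma < \infty$ for a $\gamma > 0$, and $\alpha(n) = O(n^{-\rho})$ for a $\rho > \frac{3\gamma\delta + \delta + 5\gamma + 2}{2\gamma\delta}$,

then

$$\mathrm{Var}^*\big[\sqrt{bl}\,U_n^*(h)\big] - \mathrm{Var}\big[\sqrt{n}\,U_n(h)\big] \xrightarrow{P} 0, \qquad (2)$$

$$\sup_{x\in\mathbb{R}}\Big|P^*\big[\sqrt{bl}\,\big(U_n^*(h) - E^*[U_n^*(h)]\big) \le x\big] - P\big[\sqrt{n}\,\big(U_n(h) - \theta\big) \le x\big]\Big| \xrightarrow{P} 0. \qquad (3)$$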



If we assume the existence of higher moments, we can achieve almost sure convergence: 

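Theorem 2.2. Let $(X_n)_{n\in\mathbb{N}}$ be a stationary and absolutely regular process and $h$ a kernel such that for a $\delta > 0$, $M > 0$:

$$\int\!\!\int |h(x_1, x_2)|^{4+\delta}\, dF(x_1)\, dF(x_2) \le M,$$
$$\forall k \in \mathbb{N}_0: \int |h(x_1, x_{1+k})|^{4+\delta}\, dP(x_1, x_{1+k}) \le M,$$

and for a $\delta' \in (0, \delta)$: $\beta(n) = O\big(n^{-\frac{3(4+\delta')}{\delta'}}\big)$. If additionally $l \to \infty$ and $l = O(n^{1-\epsilon})$ for some $\epsilon > 0$, then

$$\mathrm{Var}^*\big[\sqrt{bl}\,U_n^*(h)\big] - \mathrm{Var}\big[\sqrt{n}\,U_n(h)\big] \xrightarrow{\text{a.s.}} 0, \qquad (4)$$

$$\sup_{x\in\mathbb{R}}\Big|P^*\big[\sqrt{bl}\,\big(U_n^*(h) - E^*[U_n^*(h)]\big) \le x\big] - P\big[\sqrt{n}\,\big(U_n(h) - \theta\big) \le x\big]\Big| \xrightarrow{\text{a.s.}} 0. \qquad (5)$$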



The degenerate part of the bootstrapped U-statistic converges to zero at a rate that does not depend on the block length and is faster than the convergence of the sample mean. By choosing the optimal block length for the block bootstrap variance estimator of the linear part $\sum_{i=1}^{n} h_1(X_i)$, we achieve the following rate of convergence:
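Corollary 2.3. Let $(X_n)_{n\in\mathbb{N}}$ be a stationary and absolutely regular process and $h$ a kernel such that for a $\delta > 0$, $M > 0$:

$$\int\!\!\int |h(x_1, x_2)|^{6+\delta}\, dF(x_1)\, dF(x_2) \le M,$$
$$\forall k \in \mathbb{N}_0: \int |h(x_1, x_{1+k})|^{6+\delta}\, dP(x_1, x_{1+k}) \le M,$$

and for a $\delta' \in (0, \delta)$: $\beta(n) = O\big(n^{-\frac{3(6+\delta')}{\delta'}}\big)$. Then the variance estimator converges with the following rate:

$$\min_{l}\, \mathrm{MSE}\Big(\mathrm{Var}^*\big[\sqrt{bl}\,U_n^*(h)\big]\Big) = O\big(n^{-2/3}\big). \qquad (6)$$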



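Remark 2.4. If

$$\mathrm{Var}[h_1(X_1)] + 2\sum_{k\ge 2}\mathrm{Cov}[h_1(X_1), h_1(X_k)] > 0 \quad\text{and}\quad \sum_{k\ge 1} k\,\mathrm{Cov}[h_1(X_1), h_1(X_{1+k})] \ne 0,$$

then the optimal block length $l^0 = \arg\min_l \mathrm{MSE}\big(\mathrm{Var}^*\big[\sqrt{bl}\,U_n^*(h)\big]\big)$ has the form $l^0 = K n^{1/3} + o\big(n^{1/3}\big)$ for a constant $K$ (see Corollary 3.1 of Lahiri [19]). To find this block length $l^0$, one can use the following subsampling method introduced by Hall, Horowitz and Jing [13].

Choose a pilot block size $l^*$ and a subsampling size $m = m_n$ such that $\frac{1}{l^*} + \frac{m}{n} \to 0$, and minimize

$$\widehat{\mathrm{MSE}}(l) := \frac{1}{n-m+1}\sum_{k=1}^{n-m+1}\Big(\mathrm{Var}_l^*\big[\sqrt{m}\,U_{m,k}^*(h)\big] - \mathrm{Var}_{l^*}^*\big[\sqrt{bl}\,U_n^*(h)\big]\Big)^2,$$

where $\mathrm{Var}_l^*$ is the bootstrap variance if the block length is $l$ and

$$U_{m,k}^*(h) = \frac{2}{m(m-1)}\sum_{k\le i<j\le k+m-1} h(X_i^*, X_j^*)$$

is the bootstrapped U-statistic of the $m$ observations starting with $X_k$. Choose a small $\epsilon > 0$ and set

$$\hat{l}^0 := \left(\frac{n}{m}\right)^{1/3} \arg\min_{l \le m^{1-\epsilon}} \widehat{\mathrm{MSE}}(l)$$

as the estimated optimal block length. The consistency of this subsampling method has been proved by Nordman, Lahiri and Fridley [22] for the sample mean.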
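The subsampling rule can be sketched as follows. This is an illustration built on several assumptions, not the authors' implementation:

• it uses the variance kernel of Example 1.5, so the bootstrapped U-statistic is a sample variance and can be vectorized;
• the bootstrap variances are Monte Carlo approximations;
• the values of $m$, the pilot block length $l^*$, the candidate block lengths, $\epsilon$ and the AR(1) test series are all illustrative;
• all function names are hypothetical.

```python
import numpy as np

def cbb_var_of_variance(x, block_len, n_boot, rng):
    """Monte Carlo estimate of Var*[sqrt(bl) U*] for h(x, y) = (x - y)^2 / 2,
    whose U-statistic is the sample variance; vectorised over resamples."""
    n = len(x)
    b = n // block_len
    starts = rng.integers(0, n, size=(n_boot, b))
    idx = (starts[:, :, None] + np.arange(block_len)) % n      # shape (n_boot, b, l)
    samples = x[idx.reshape(n_boot, b * block_len)]
    reps = np.sqrt(b * block_len) * samples.var(axis=1, ddof=1)
    return reps.var(ddof=1)

def hhj_block_length(x, m, pilot_len, candidates, eps=0.1, n_boot=100, seed=0):
    """Estimate the MSE of the bootstrap variance on all length-m subsamples against a
    pilot estimate, minimise over l <= m^(1 - eps), then rescale by (n / m)^(1/3)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    target = cbb_var_of_variance(x, pilot_len, n_boot, rng)   # pilot block length l*
    cands = [l for l in candidates if l <= m ** (1 - eps)]
    mse = []
    for l in cands:
        errors = [(cbb_var_of_variance(x[k:k + m], l, n_boot, rng) - target) ** 2
                  for k in range(n - m + 1)]
        mse.append(np.mean(errors))
    l_m = cands[int(np.argmin(mse))]
    return max(1, round((n / m) ** (1 / 3) * l_m))

# Hypothetical AR(1) data with coefficient 0.5 (an assumption, not the paper's setting).
rng = np.random.default_rng(1)
e = rng.normal(size=150)
x = np.empty_like(e)
x[0] = e[0]
for t in range(1, len(e)):
    x[t] = 0.5 * x[t - 1] + e[t]
l_hat = hhj_block_length(x, m=50, pilot_len=5, candidates=range(1, 8))
print(l_hat)
```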
Remark 2.5. Theorem 2.1 and Corollary 2.3 hold not only for the circular block bootstrap, but also for the moving block and the non-overlapping block bootstrap. The proof rests on three observations:

• there are results analogous to the theorem of Shao and Yu [25] for these bootstrap methods (see the book of Lahiri [20] and the references therein);
• Theorem 3.3 of Lahiri [19] treats all three methods;
• the bounds for the bootstrap version of the degenerate part $U_n^*(h_2)$ (Lemmas 3.7 and 3.8) remain valid.



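Simulation results: We study the estimator for the variance $\sigma^2 = \mathrm{Var}[X_1]$, which can be expressed as a U-statistic (see Example 1.5):

$$\hat{\sigma}_n^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2.$$

The data come from the stationary autoregressive process defined by $X_n = a X_{n-1} + \epsilon_n$, where $(\epsilon_n)_{n\in\mathbb{Z}}$ is a sequence of iid standard normal random variables. The distance between the real and the bootstrapped distribution function

$$D_{\mathrm{boot}} = \sup_{x\in\mathbb{R}}\Big|P^*\big[\sqrt{bl}\,\big(\hat{\sigma}_n^{*2} - E^*[\hat{\sigma}_n^{*2}]\big) \le x\big] - P\big[\sqrt{n}\,(\hat{\sigma}_n^2 - \sigma^2) \le x\big]\Big|$$

is compared to

$$D_{\mathrm{norm}} = \sup_{x\in\mathbb{R}}\left|\Phi\left(\frac{x}{\sqrt{n\,\widehat{\mathrm{Var}}[\hat{\sigma}_n^2]}}\right) - P\big[\sqrt{n}\,(\hat{\sigma}_n^2 - \sigma^2) \le x\big]\right|,$$

where $\Phi$ is the distribution function of a standard normal random variable. The covariance matrix of $(\bar{X}, \overline{X^2})$ is estimated using the moment method, including the autocovariances for lags not bigger than $l$. Applying the δ-method, one obtains:

$$\widehat{\mathrm{Var}}[\hat{\sigma}_n^2] = \frac{1}{n^2}\Bigg(\sum_{|i-j|\le l}(X_i^2 - \overline{X^2})(X_j^2 - \overline{X^2}) - 4\bar{X}\sum_{|i-j|\le l}(X_i^2 - \overline{X^2})(X_j - \bar{X}) + 4\bar{X}^2\sum_{|i-j|\le l}(X_i - \bar{X})(X_j - \bar{X})\Bigg).$$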



We have calculated the distances $D_{\mathrm{boot}}$ and $D_{\mathrm{norm}}$ with the empirical distribution function of 10,000 random variables.

The following table shows the mean of 1,000 realizations of $D_{\mathrm{boot}}$ and $D_{\mathrm{norm}}$ for different sample sizes $n$ and block lengths $l$, where the block lengths are integer approximations to $n^{1/3}$. In all cases, the moving block bootstrap performs better than the normal approximation:






sample size n    block length l    bootstrap    normal approx.
24               3                 0.153        0.196
48               4                 0.111        0.125
100              5                 0.076        0.091
200              6                 0.060        0.073
500              8                 0.039        0.046


The boxplots below give a closer look at the distributions of $D_{\mathrm{boot}}$ and $D_{\mathrm{norm}}$. The bootstrap version $D_{\mathrm{boot}}$ not only has the lower median, but also produces far fewer outliers than the normal approximation.



[Figure: boxplots of $D_{\mathrm{boot}}$ (bootstrap) and $D_{\mathrm{norm}}$ (normal) for n = 100 and n = 200; vertical axis from 0.0 to 0.4.]
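The distances $D_{\mathrm{boot}}$ and $D_{\mathrm{norm}}$ are supremum distances between distribution functions; in the simulation they are evaluated with empirical distribution functions. Below is a minimal sketch of that computation for two samples, together with the block lengths of the table. The function name is hypothetical. For $D_{\mathrm{norm}}$, one of the two functions would be the normal distribution function $\Phi$ instead of an empirical one.

```python
import numpy as np

def sup_distance(sample_a, sample_b):
    """sup_x |F_a(x) - F_b(x)| for the empirical distribution functions of two samples.
    Both step functions only jump at observed points, so the supremum is attained there."""
    a = np.sort(np.asarray(sample_a, dtype=float))
    b = np.sort(np.asarray(sample_b, dtype=float))
    grid = np.concatenate([a, b])
    F_a = np.searchsorted(a, grid, side="right") / len(a)   # F_a(x) = #{a_i <= x} / len(a)
    F_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(F_a - F_b)))

# Block lengths used in the table: integer approximations to n^(1/3).
block_lengths = [round(n ** (1 / 3)) for n in (24, 48, 100, 200, 500)]
print(block_lengths)   # [3, 4, 5, 6, 8]
```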



3 Auxiliary Results 

3.1 Generalized Covariance Inequalities 

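Yoshihara proved the asymptotic normality of the U-statistic $U_n(h)$ with the help of the Hoeffding decomposition and the following generalized covariance inequality:

Lemma 3.1 (Yoshihara [27]). If there are $\delta, M > 0$ such that for all $k \in \mathbb{N}_0$

$$\int\!\!\int |h(x_1, x_2)|^{2+\delta}\, dF(x_1)\, dF(x_2) \le M,$$
$$\int |h(x_1, x_{1+k})|^{2+\delta}\, dP(x_1, x_{1+k}) \le M,$$

then there is a constant $K$ with the following property. Let $i_{(1)} \le i_{(2)} \le i_{(3)} \le i_{(4)}$ be the ordered indices $i_1, \ldots, i_4$ and $m = \max\{i_{(2)} - i_{(1)},\, i_{(4)} - i_{(3)}\}$. Then

$$\big|E[h_2(X_{i_1}, X_{i_2})\, h_2(X_{i_3}, X_{i_4})]\big| \le K\beta^{\frac{\delta}{2+\delta}}(m). \qquad (7)$$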
To prove Lemma 3.1 under absolute regularity, one can use coupling techniques (see Berbee [2] and Berkes and Philipp [3]): for dependent random variables $X$ and $Y$, one can find a random variable $X'$ such that

• $X'$ has the same distribution as $X$,
• $X'$ and $Y$ are independent,
• $P[X \ne X'] = \beta(X, Y)$.

Such a coupling is impossible under strong mixing, as can be seen e.g. from the results of Dehling [8]. Bradley [6], however, established a weaker type of coupling for strongly mixing random variables. His construction uses the fact that absolute regularity and strong mixing are equivalent for random variables taking their values in a finite set, and approximates general random variables by such discrete ones:

Lemma 3.2 (Bradley [6]). Let $X$, $Y$ be random variables, $X$ real-valued with $E|X|^\gamma < \infty$. Let $0 < \epsilon \le \|X\|_\gamma$. Then there exists (after enlarging the underlying probability space if necessary) a random variable $X'$ such that

• $X'$ has the same distribution as $X$,
• $X'$ and $Y$ are independent,
• $P[|X - X'| > \epsilon] \le 18\left(\frac{\|X\|_\gamma}{\epsilon}\right)^{\frac{\gamma}{2\gamma+1}}\alpha^{\frac{2\gamma}{2\gamma+1}}(X, Y)$.   (8)

Under strong mixing, this coupling allows small differences between $X$ and $X'$, whereas under absolute regularity $X$ and $X'$ are equal with high probability. This is why we need the $\mathcal{P}$-Lipschitz-continuity of the kernel.

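Lemma 3.3. Let $h$ be a $\mathcal{P}$-Lipschitz-continuous kernel with constant $L$ and $(X_n)_{n\in\mathbb{N}}$ a stationary sequence of random variables. Suppose there are $\gamma > 0$ with $E|X_1|^\gamma < \infty$, and $M > 0$, $\delta > 0$, such that for all $k \in \mathbb{N}_0$

$$\int\!\!\int |h(x_1, x_2)|^{2+\delta}\, dF(x_1)\, dF(x_2) \le M,$$
$$\int |h(x_1, x_{1+k})|^{2+\delta}\, dP(x_1, x_{1+k}) \le M.$$

Then there exists a constant $K = K(\gamma, \|X_1\|_\gamma, \delta, M, L)$ such that the following inequality holds with $m = \max\{i_{(2)} - i_{(1)},\, i_{(4)} - i_{(3)}\}$:

$$\big|E[h_2(X_{i_1}, X_{i_2})\, h_2(X_{i_3}, X_{i_4})]\big| \le K\alpha^{\frac{2\gamma\delta}{3\gamma\delta + \delta + 5\gamma + 2}}(m). \qquad (9)$$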
3.2 Bounds for the Degenerate Part of a U-Statistic 

With the covariance inequalities one can show that the covariance of the summands $h_2(X_i, X_j)$ is small if the gap between the indices is big enough. The degenerate part therefore decreases fast enough that it does not disturb the asymptotic normality of the U-statistic.

Lemma 3.4 (Yoshihara [27]). If the assumptions of Lemma 3.1 hold and furthermore, for a $\delta' \in (0, \delta)$,

$$\beta(n) = O\big(n^{-\frac{2+\delta'}{\delta'}}\big),$$

then for $U_n(h_2)$

$$E[n U_n^2(h_2)] \le \frac{4}{n(n-1)^2}\sum_{1\le i_1<i_2\le n}\ \sum_{1\le i_3<i_4\le n}\big|E[h_2(X_{i_1}, X_{i_2})\, h_2(X_{i_3}, X_{i_4})]\big| \le \frac{4}{n^3}\sum_{i_1, i_2, i_3, i_4=1}^{n}\big|E[h_2(X_{i_1}, X_{i_2})\, h_2(X_{i_3}, X_{i_4})]\big| = O\big(n^{-\eta}\big) \qquad (10)$$

with $\eta = \min\left\{\frac{2(\delta - \delta')}{\delta'(2+\delta)},\ 1\right\}$.

So $\sqrt{n}\,U_n(h_2)$ converges to 0 in $L^2$ as $n$ increases. For one of our later results, we also need another of Yoshihara's lemmas. Our assumptions and result differ slightly from the lemma in [27], as we believe it contains a misprint:



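Lemma 3.5 (Yoshihara [27]). If

$$\int\!\!\int |h(x_1, x_2)|^{4+\delta}\, dF(x_1)\, dF(x_2) \le M,$$
$$\forall k \in \mathbb{N}_0: \int |h(x_1, x_{1+k})|^{4+\delta}\, dP(x_1, x_{1+k}) \le M,$$

and for a $\delta' \in (0, \delta)$: $\beta(n) = O\big(n^{-\frac{3(4+\delta')}{\delta'}}\big)$, then for $\eta' = \min\left\{\frac{12(\delta - \delta')}{\delta'(4+\delta)},\ 1\right\}$

$$E[n^2 U_n^4(h_2)] \le \frac{16}{n^6}\sum_{i_1, \ldots, i_8=1}^{n}\big|E[h_2(X_{i_1}, X_{i_2})\, h_2(X_{i_3}, X_{i_4})\, h_2(X_{i_5}, X_{i_6})\, h_2(X_{i_7}, X_{i_8})]\big| = O\big(n^{-1-\eta'}\big). \qquad (11)$$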
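Now we show a result analogous to Lemma 3.4 under strong mixing:

Lemma 3.6. If the assumptions of Lemma 3.3 hold and, for a $\rho > \frac{3\gamma\delta + \delta + 5\gamma + 2}{2\gamma\delta}$,

$$\alpha(n) = O(n^{-\rho}),$$

then for $U_n(h_2)$

$$E[n U_n^2(h_2)] \le \frac{4}{n(n-1)^2}\sum_{1\le i_1<i_2\le n}\ \sum_{1\le i_3<i_4\le n}\big|E[h_2(X_{i_1}, X_{i_2})\, h_2(X_{i_3}, X_{i_4})]\big| \le \frac{4}{n^3}\sum_{i_1, i_2, i_3, i_4=1}^{n}\big|E[h_2(X_{i_1}, X_{i_2})\, h_2(X_{i_3}, X_{i_4})]\big| = O\big(n^{-\eta}\big) \qquad (12)$$

with $\eta = \min\left\{\rho\,\frac{2\gamma\delta}{3\gamma\delta + \delta + 5\gamma + 2} - 1,\ 1\right\}$.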
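The rates in Lemmas 3.4 and 3.6 are explicit functions of the moment and mixing parameters. The small helper below evaluates them, together with the threshold on $\rho$, for given $\gamma$, $\delta$ and $\delta'$. It uses exact rational arithmetic from Python's `fractions` module; the function names are hypothetical.

```python
from fractions import Fraction as Fr

def eta_beta(delta, delta_p):
    """Rate exponent of Lemma 3.4: min{2(delta - delta') / (delta' (2 + delta)), 1}."""
    if not 0 < delta_p < delta:
        raise ValueError("need 0 < delta' < delta")
    return min(2 * (delta - delta_p) / (delta_p * (2 + delta)), Fr(1))

def rho_threshold(gamma, delta):
    """Lower bound (3*gamma*delta + delta + 5*gamma + 2) / (2*gamma*delta) on rho in Lemma 3.6."""
    return (3 * gamma * delta + delta + 5 * gamma + 2) / (2 * gamma * delta)

def eta_alpha(gamma, delta, rho):
    """Rate exponent of Lemma 3.6: min{rho * 2*gamma*delta / (3*gamma*delta + delta + 5*gamma + 2) - 1, 1}."""
    if not rho > rho_threshold(gamma, delta):
        raise ValueError("rho must exceed the threshold of Lemma 3.6")
    return min(rho * 2 * gamma * delta / (3 * gamma * delta + delta + 5 * gamma + 2) - 1, Fr(1))

print(rho_threshold(Fr(1), Fr(1)))          # 11/2
print(eta_alpha(Fr(1), Fr(1), Fr(33, 4)))   # 1/2
print(eta_beta(Fr(1), Fr(1, 2)))            # 2/3
```

The threshold on $\rho$ is exactly the value at which $\rho\,\frac{2\gamma\delta}{3\gamma\delta+\delta+5\gamma+2} = 1$, so every admissible $\rho$ yields $\eta > 0$.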



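We need a bound for $U_n^*(h_2) = \frac{2}{bl(bl-1)}\sum_{1\le i<j\le bl} h_2(X_i^*, X_j^*)$. We use Yoshihara's inequality for the second moment or Lemma 3.6, respectively, together with the fact that the bootstrap expectation of a U-statistic is similar to a von Mises-statistic. This gives:

Lemma 3.7. Let $(X_n)_{n\in\mathbb{N}}$ be a stationary, mixing process and $h$ a kernel such that for a $\delta > 0$, $M > 0$:

$$\int\!\!\int |h(x_1, x_2)|^{2+\delta}\, dF(x_1)\, dF(x_2) \le M,$$
$$\forall k \in \mathbb{N}_0: \int |h(x_1, x_{1+k})|^{2+\delta}\, dP(x_1, x_{1+k}) \le M.$$

If one of the following two conditions holds:

• for a $\delta' \in (0, \delta)$: $\beta(n) = O\big(n^{-\frac{2+\delta'}{\delta'}}\big)$,
• $h$ is $\mathcal{P}$-Lipschitz-continuous, $E|X_1|^\gamma < \infty$ for a $\gamma > 0$, and $\alpha(n) = O(n^{-\rho})$ for a $\rho > \frac{3\gamma\delta + \delta + 5\gamma + 2}{2\gamma\delta}$,

then for $\eta = \min\left\{\frac{2(\delta - \delta')}{\delta'(2+\delta)},\ 1\right\}$, respectively $\eta = \min\left\{\rho\,\frac{2\gamma\delta}{3\gamma\delta + \delta + 5\gamma + 2} - 1,\ 1\right\}$,

$$E\big[E^*[bl\, U_n^{*2}(h_2)]\big] = O\big(n^{-\eta}\big). \qquad (13)$$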



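With the inequality for the fourth moment, we can calculate a faster rate of convergence. Note that this rate does not depend on the block length.

Lemma 3.8. If

$$\int\!\!\int |h(x_1, x_2)|^{4+\delta}\, dF(x_1)\, dF(x_2) \le M,$$
$$\forall k \in \mathbb{N}_0: \int |h(x_1, x_{1+k})|^{4+\delta}\, dP(x_1, x_{1+k}) \le M,$$

and for a $\delta' \in (0, \delta)$: $\beta(n) = O\big(n^{-\frac{3(4+\delta')}{\delta'}}\big)$, then for $\eta' = \min\left\{\frac{12(\delta - \delta')}{\delta'(4+\delta)},\ 1\right\}$

$$E\big[E^*[(bl)^2\, U_n^{*4}(h_2)]\big] = O\big(n^{-1-\eta'}\big). \qquad (14)$$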
4 Proofs 

We will first prove the auxiliary results and after that the CLT and the theorems about 
the bootstrap. 

4.1 Auxiliary Results 

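Proof of Lemma 3.3. For simplicity, we consider only the case $i_1 < i_2 < i_3 < i_4$ and $i_2 - i_1 \ge i_4 - i_3$, so that $m = i_2 - i_1$. Let $\epsilon > 0$, $K > 0$ and define the truncated kernel

$$h_{2,K}(x, y) := \max\big\{-\sqrt{K},\ \min\{h_2(x, y), \sqrt{K}\}\big\}.$$

Enlarging $M$ if necessary, we may assume that the moment bounds of the lemma also hold with $h$ replaced by $h_2$.

$h_2$ is $\mathcal{P}$-Lipschitz-continuous with constant $2L$. To see this, take $X$, $X'$, $Y$ as in Definition 1.4, and let $Y'$ have the same distribution as $Y$ and be independent of $X$ and $X'$. Then

$$\begin{aligned}
E\big[|h_2(X, Y) - h_2(X', Y)|\,\mathbb{1}_{\{|X-X'|\le\epsilon\}}\big]
&\le E\big[|h(X, Y) - h(X', Y)|\,\mathbb{1}_{\{|X-X'|\le\epsilon\}}\big] + E\big[|h_1(X) - h_1(X')|\,\mathbb{1}_{\{|X-X'|\le\epsilon\}}\big]\\
&\le E\big[|h(X, Y) - h(X', Y)|\,\mathbb{1}_{\{|X-X'|\le\epsilon\}}\big] + E\big[|h(X, Y') - h(X', Y')|\,\mathbb{1}_{\{|X-X'|\le\epsilon\}}\big] \le 2L\epsilon.
\end{aligned}$$

Obviously, $h_{2,K}$ is $\mathcal{P}$-Lipschitz-continuous with the same constant $2L$ as $h_2$, since truncation does not increase distances. By Lemma 3.2, we can choose a random variable $X_{i_1}'$ with the following properties: it is independent of $(X_{i_2}, X_{i_3}, X_{i_4})$, it has the same distribution as $X_{i_1}$, and

$$P\big[|X_{i_1} - X_{i_1}'| > \epsilon\big] \le 18\left(\frac{\|X_1\|_\gamma}{\epsilon}\right)^{\frac{\gamma}{2\gamma+1}}\alpha^{\frac{2\gamma}{2\gamma+1}}(m).$$

As $h_2$ is a degenerate kernel, we have

$$E\big[h_2(X_{i_1}', X_{i_2})\, h_2(X_{i_3}, X_{i_4})\big] = 0.$$

Therefore, we get:

$$\begin{aligned}
\big|E[h_2(X_{i_1}, X_{i_2})\, h_2(X_{i_3}, X_{i_4})]\big|
&= \big|E[h_2(X_{i_1}, X_{i_2})\, h_2(X_{i_3}, X_{i_4})] - E[h_2(X_{i_1}', X_{i_2})\, h_2(X_{i_3}, X_{i_4})]\big|\\
&\le E\big[|h_{2,K}(X_{i_1}, X_{i_2}) - h_{2,K}(X_{i_1}', X_{i_2})|\,|h_{2,K}(X_{i_3}, X_{i_4})|\,\mathbb{1}_{\{|X_{i_1}-X_{i_1}'|\le\epsilon\}}\big]\\
&\quad + E\big[|h_{2,K}(X_{i_1}, X_{i_2}) - h_{2,K}(X_{i_1}', X_{i_2})|\,|h_{2,K}(X_{i_3}, X_{i_4})|\,\mathbb{1}_{\{|X_{i_1}-X_{i_1}'|>\epsilon\}}\big]\\
&\quad + E\big[|h_{2,K}(X_{i_1}, X_{i_2})\, h_{2,K}(X_{i_3}, X_{i_4}) - h_2(X_{i_1}, X_{i_2})\, h_2(X_{i_3}, X_{i_4})|\big]\\
&\quad + E\big[|h_{2,K}(X_{i_1}', X_{i_2})\, h_{2,K}(X_{i_3}, X_{i_4}) - h_2(X_{i_1}', X_{i_2})\, h_2(X_{i_3}, X_{i_4})|\big].
\end{aligned}$$

Because of the $\mathcal{P}$-Lipschitz-continuity and $|h_{2,K}(X_{i_3}, X_{i_4})| \le \sqrt{K}$, the first summand is at most $2L\epsilon\sqrt{K}$. As $|h_{2,K}| \le \sqrt{K}$, Lemma 3.2 bounds the second term by

$$2K\,P\big[|X_{i_1} - X_{i_1}'| > \epsilon\big] \le 36\left(\frac{\|X_1\|_\gamma}{\epsilon}\right)^{\frac{\gamma}{2\gamma+1}}\alpha^{\frac{2\gamma}{2\gamma+1}}(m)\, K.$$

For the third summand, write $Z_1 = h_2(X_{i_1}, X_{i_2})$ and $Z_2 = h_2(X_{i_3}, X_{i_4})$, and distinguish which of $|Z_1|$, $|Z_2|$ exceed $\sqrt{K}$:

$$\begin{aligned}
E\big[|h_{2,K}(X_{i_1}, X_{i_2})\, h_{2,K}(X_{i_3}, X_{i_4}) - Z_1 Z_2|\big]
&\le E\big[(|Z_1| - \sqrt{K})\sqrt{K}\,\mathbb{1}_{\{|Z_1|>\sqrt{K},\,|Z_2|\le\sqrt{K}\}}\big] + E\big[(|Z_2| - \sqrt{K})\sqrt{K}\,\mathbb{1}_{\{|Z_1|\le\sqrt{K},\,|Z_2|>\sqrt{K}\}}\big]\\
&\quad + E\big[(|Z_1||Z_2| - K)\,\mathbb{1}_{\{|Z_1|>\sqrt{K},\,|Z_2|>\sqrt{K}\}}\big]\\
&\le \frac{1}{2}E\big[Z_1^2\,\mathbb{1}_{\{|Z_1|>\sqrt{K}\}}\big] + \frac{1}{2}E\big[Z_2^2\,\mathbb{1}_{\{|Z_2|>\sqrt{K}\}}\big]
\le \frac{1}{2}\,\frac{E|Z_1|^{2+\delta}}{K^{\delta/2}} + \frac{1}{2}\,\frac{E|Z_2|^{2+\delta}}{K^{\delta/2}} \le \frac{M}{K^{\delta/2}},
\end{aligned}$$

where we used $(s - \sqrt{K})\sqrt{K} \le s^2/4$ and $st - K \le (s^2 + t^2)/2$. After treating the fourth summand in the same way, we altogether get:

$$\big|E[h_2(X_{i_1}, X_{i_2})\, h_2(X_{i_3}, X_{i_4})]\big| \le 2L\epsilon\sqrt{K} + 36\left(\frac{\|X_1\|_\gamma}{\epsilon}\right)^{\frac{\gamma}{2\gamma+1}}\alpha^{\frac{2\gamma}{2\gamma+1}}(m)\, K + \frac{2M}{K^{\delta/2}} =: f(\epsilon, K).$$

Setting $\epsilon^0 = \|X_1\|_\gamma^{\frac{\gamma}{3\gamma+1}} L^{-\frac{2\gamma+1}{3\gamma+1}}\alpha^{\frac{2\gamma}{3\gamma+1}}(m)\, K^{\frac{2\gamma+1}{6\gamma+2}}$, we obtain:

$$f(\epsilon^0, K) = 38\,\|X_1\|_\gamma^{\frac{\gamma}{3\gamma+1}} L^{\frac{\gamma}{3\gamma+1}} K^{\frac{5\gamma+2}{6\gamma+2}}\alpha^{\frac{2\gamma}{3\gamma+1}}(m) + \frac{2M}{K^{\delta/2}}.$$

With $D := 3\gamma\delta + \delta + 5\gamma + 2$ and $K^0 = \|X_1\|_\gamma^{-\frac{2\gamma}{D}} L^{-\frac{2\gamma}{D}}\alpha^{-\frac{4\gamma}{D}}(m)\, M^{\frac{6\gamma+2}{D}}$, we get the bound

$$\big|E[h_2(X_{i_1}, X_{i_2})\, h_2(X_{i_3}, X_{i_4})]\big| \le f(\epsilon^0, K^0) = 40\,\|X_1\|_\gamma^{\frac{\gamma\delta}{D}} L^{\frac{\gamma\delta}{D}} M^{\frac{5\gamma+2}{D}}\alpha^{\frac{2\gamma\delta}{D}}(m).$$

□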
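The choice of $\epsilon^0$ and $K^0$ at the end of this proof is pure algebra, so it can be double-checked numerically. The sketch below evaluates $f(\epsilon^0, K^0)$ directly and compares it with the closed-form bound $40\,\|X_1\|_\gamma^{\gamma\delta/D} L^{\gamma\delta/D} M^{(5\gamma+2)/D}\alpha^{2\gamma\delta/D}(m)$. The function names are hypothetical and the parameter values arbitrary.

```python
import numpy as np

def f(eps, K, gamma, delta, L, M, norm_x, alpha):
    """Bound f(eps, K) from the proof of Lemma 3.3."""
    c = gamma / (2 * gamma + 1)
    return (2 * L * eps * np.sqrt(K)
            + 36 * (norm_x / eps) ** c * alpha ** (2 * c) * K
            + 2 * M / K ** (delta / 2))

def optimised(gamma, delta, L, M, norm_x, alpha):
    """The choices eps^0, K^0 from the proof and the resulting closed-form bound."""
    D = 3 * gamma * delta + delta + 5 * gamma + 2
    K0 = ((norm_x * L) ** (-2 * gamma / D) * alpha ** (-4 * gamma / D)
          * M ** ((6 * gamma + 2) / D))
    eps0 = (norm_x ** (gamma / (3 * gamma + 1)) * L ** (-(2 * gamma + 1) / (3 * gamma + 1))
            * alpha ** (2 * gamma / (3 * gamma + 1)) * K0 ** ((2 * gamma + 1) / (6 * gamma + 2)))
    bound = (40 * (norm_x * L) ** (gamma * delta / D) * M ** ((5 * gamma + 2) / D)
             * alpha ** (2 * gamma * delta / D))
    return eps0, K0, bound

params = dict(gamma=1.5, delta=0.8, L=2.0, M=3.0, norm_x=1.2, alpha=1e-3)
eps0, K0, bound = optimised(**params)
print(f(eps0, K0, **params), bound)   # the two values coincide
```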



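Proof of Lemma 3.6. The proof is exactly the same as that of Yoshihara's Lemma 3.4, using Lemma 3.3 instead of Lemma 3.1. We therefore concentrate on the case $i_1 < i_2 < i_3 < i_4$ and $i_2 - i_1 \ge i_4 - i_3$. If $i_2 - i_1 = m$, there are at most $n$ possibilities each for $i_1$ and $i_3$, and $m$ possibilities for $i_4$:

$$\sum_{\substack{i_1<i_2<i_3<i_4\\ i_2-i_1\ge i_4-i_3}}\big|E[h_2(X_{i_1}, X_{i_2})\, h_2(X_{i_3}, X_{i_4})]\big| \le n^2\sum_{m=1}^{n} m\, K\,\alpha^{\frac{2\gamma\delta}{3\gamma\delta+\delta+5\gamma+2}}(m) \le K' n^2\sum_{m=1}^{n} m^{1-\rho\frac{2\gamma\delta}{3\gamma\delta+\delta+5\gamma+2}} = O\big(n^{3-\eta}\big).$$

With a similar argument for the other cases, we get

$$\sum_{i_1, i_2, i_3, i_4=1}^{n}\big|E[h_2(X_{i_1}, X_{i_2})\, h_2(X_{i_3}, X_{i_4})]\big| = O\big(n^{3-\eta}\big).$$

□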



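Proof of Lemma 3.7. The bootstrapped expectation of $h_2(X_{i_1}^*, X_{i_2}^*)\, h_2(X_{i_3}^*, X_{i_4}^*)$ (conditionally on $(X_n)_{n\in\mathbb{N}}$) depends on the way the indices $i_1, i_2, i_3, i_4$ are allocated to the different blocks. First consider indices $i_1, i_2, i_3, i_4$ lying in four different blocks; then $X_{i_1}^*, \ldots, X_{i_4}^*$ are independent for fixed $(X_n)_{n\in\mathbb{N}}$. In this case the bootstrapped expectation of $h_2(X_{i_1}^*, X_{i_2}^*)\, h_2(X_{i_3}^*, X_{i_4}^*)$ is a von Mises-statistic and we get

$$\Big|E\big[E^*[h_2(X_{i_1}^*, X_{i_2}^*)\, h_2(X_{i_3}^*, X_{i_4}^*)]\big]\Big| = \Big|E\Big[\frac{1}{n^4}\sum_{j_1, j_2, j_3, j_4=1}^{n} h_2(X_{j_1}, X_{j_2})\, h_2(X_{j_3}, X_{j_4})\Big]\Big| \le \frac{1}{n^4}\sum_{j_1, j_2, j_3, j_4=1}^{n}\big|E[h_2(X_{j_1}, X_{j_2})\, h_2(X_{j_3}, X_{j_4})]\big|.$$

There are at most $n^4$ possibilities for the four indices to be in four different blocks, so

$$\sum_{\substack{i_1, i_2, i_3, i_4\\ \text{4 diff. blocks}}}\Big|E\big[E^*[h_2(X_{i_1}^*, X_{i_2}^*)\, h_2(X_{i_3}^*, X_{i_4}^*)]\big]\Big| \le \sum_{j_1, j_2, j_3, j_4=1}^{n}\big|E[h_2(X_{j_1}, X_{j_2})\, h_2(X_{j_3}, X_{j_4})]\big|.$$

As an example, now let $i_1$ and $i_2$ lie in the same block (write $i_1 \sim i_2$) with $i_2 - i_1 = k$, while $i_3$, $i_4$ lie in two further blocks. The bootstrapped expectation is no longer a von Mises-statistic, as $X_{i_1}^*$ and $X_{i_2}^*$ are dependent. To repair this, we add up the expected values for all $i_2$ in the same block as $i_1$. There are at most $n^3$ possibilities for $i_1, i_3, i_4$, and the indices of the $X_j$ are understood modulo $n$, as in the periodic extension. This gives

$$\begin{aligned}
\Big|E\big[E^*[h_2(X_{i_1}^*, X_{i_2}^*)\, h_2(X_{i_3}^*, X_{i_4}^*)]\big]\Big| &\le \frac{1}{n^3}\sum_{j_1, j_3, j_4=1}^{n}\big|E[h_2(X_{j_1}, X_{j_1+k})\, h_2(X_{j_3}, X_{j_4})]\big|,\\
\sum_{i_2:\, i_2\sim i_1}\Big|E\big[E^*[h_2(X_{i_1}^*, X_{i_2}^*)\, h_2(X_{i_3}^*, X_{i_4}^*)]\big]\Big| &\le \frac{1}{n^3}\sum_{j_1, j_3, j_4=1}^{n}\ \sum_{j_2=j_1}^{j_1+l-1}\big|E[h_2(X_{j_1}, X_{j_2})\, h_2(X_{j_3}, X_{j_4})]\big|,\\
\sum_{\substack{i_1\sim i_2,\ i_3, i_4\\ \text{in two further blocks}}}\Big|E\big[E^*[h_2(X_{i_1}^*, X_{i_2}^*)\, h_2(X_{i_3}^*, X_{i_4}^*)]\big]\Big| &\le \sum_{j_1, j_2, j_3, j_4=1}^{n}\big|E[h_2(X_{j_1}, X_{j_2})\, h_2(X_{j_3}, X_{j_4})]\big|.
\end{aligned}$$

When the indices are allocated to the blocks in another way, analogous arguments can be used. Combining all cases with Lemma 3.4 or 3.6, and keeping in mind that $n/(bl) \to 1$, we get

$$E\big[E^*[bl\, U_n^{*2}(h_2)]\big] \le \frac{K}{bl(bl-1)^2}\sum_{j_1, j_2, j_3, j_4=1}^{n}\big|E[h_2(X_{j_1}, X_{j_2})\, h_2(X_{j_3}, X_{j_4})]\big| = O\big(n^{-\eta}\big).$$

□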
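The proof uses two facts:

• indices in different blocks give a von Mises-type average;
• two indices at lag $k$ in the same block give the average of $h(X_j, X_{j+k})$ over the periodically extended sample.

Both can be checked by enumerating all equally likely block starts. Below is a minimal sketch with hypothetical helper names; an arbitrary function of two arguments stands in for $h_2$.

```python
import numpy as np

def same_block_expectation(x, g, r, k):
    """Exact E*[g(X*_a, X*_b)] for positions a, b in one circular block at offsets r and r + k:
    average over the n equally likely block starts j (indices modulo n)."""
    n = len(x)
    return np.mean([g(x[(j + r) % n], x[(j + r + k) % n]) for j in range(n)])

def different_block_expectation(x, g, r1, r2):
    """Exact E*[g(X*_a, X*_b)] for positions in two independently drawn blocks."""
    n = len(x)
    return np.mean([g(x[(j1 + r1) % n], x[(j2 + r2) % n])
                    for j1 in range(n) for j2 in range(n)])

def g(u, v):
    # arbitrary illustrative function of two arguments
    return u * v + np.cos(u - v)

rng = np.random.default_rng(0)
x = rng.normal(size=12)
lag_avg = np.mean([g(x[j], x[(j + 2) % 12]) for j in range(12)])   # (1/n) sum_j g(X_j, X_{j+2})
von_mises = np.mean([[g(u, v) for v in x] for u in x])             # (1/n^2) sum_{i,j} g(X_i, X_j)
print(same_block_expectation(x, g, r=1, k=2), lag_avg)
print(different_block_expectation(x, g, r1=0, r2=3), von_mises)
```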



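Proof of Lemma 3.8. We use similar arguments as above. If $i_1, \ldots, i_8$ are in 8 different blocks, then the bootstrapped expectation is bounded by

$$\Big|E\big[E^*[h_2(X_{i_1}^*, X_{i_2}^*)\, h_2(X_{i_3}^*, X_{i_4}^*)\, h_2(X_{i_5}^*, X_{i_6}^*)\, h_2(X_{i_7}^*, X_{i_8}^*)]\big]\Big| \le \frac{1}{n^8}\sum_{j_1, \ldots, j_8=1}^{n}\big|E[h_2(X_{j_1}, X_{j_2})\, h_2(X_{j_3}, X_{j_4})\, h_2(X_{j_5}, X_{j_6})\, h_2(X_{j_7}, X_{j_8})]\big|.$$

Now let $i_1$ and $i_2$ lie in the same block and the other indices in different blocks. Then add up the expectations for all $i_2$ in the same block as $i_1$:

$$\sum_{i_2:\, i_2\sim i_1}\Big|E\big[E^*[h_2(X_{i_1}^*, X_{i_2}^*)\, h_2(X_{i_3}^*, X_{i_4}^*)\, h_2(X_{i_5}^*, X_{i_6}^*)\, h_2(X_{i_7}^*, X_{i_8}^*)]\big]\Big| \le \frac{1}{n^7}\sum_{j_1, j_3, \ldots, j_8=1}^{n}\ \sum_{j_2=j_1}^{j_1+l-1}\big|E[h_2(X_{j_1}, X_{j_2})\, h_2(X_{j_3}, X_{j_4})\, h_2(X_{j_5}, X_{j_6})\, h_2(X_{j_7}, X_{j_8})]\big|.$$

Treating the other cases in the same way, we obtain by Lemma 3.5

$$E\big[E^*[(bl)^2\, U_n^{*4}(h_2)]\big] \le \frac{K}{(bl)^2(bl-1)^4}\sum_{j_1, \ldots, j_8=1}^{n}\big|E[h_2(X_{j_1}, X_{j_2})\, h_2(X_{j_3}, X_{j_4})\, h_2(X_{j_5}, X_{j_6})\, h_2(X_{j_7}, X_{j_8})]\big| = O\big(n^{-1-\eta'}\big).$$

□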



4.2 U-Statistic CLT 



Proof of Theorem 1.8. Under the absolute regularity condition, this is Theorem 1 of Yoshihara [27]. Under the strong mixing condition, we use the Hoeffding decomposition:

$$\sqrt{n}\,\big(U_n(h) - \theta\big) = \frac{2}{\sqrt{n}}\sum_{i=1}^{n} h_1(X_i) + \sqrt{n}\,U_n(h_2).$$

The first summand has a normal limit with variance $4\sigma^2$ by Theorem 1.7 of Ibragimov [17]. The second summand converges in probability to zero because of Lemma 3.6. The theorem follows with Slutsky's lemma. □

4.3 Bootstrapping U-Statistics 

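Proof of Theorem 2.1. Use the Hoeffding decomposition

$$\sqrt{bl}\,\big(U_n^*(h) - E^*[U_n^*(h)]\big) = \frac{2}{\sqrt{bl}}\sum_{i=1}^{bl}\big(h_1(X_i^*) - E^*[h_1(X_i^*)]\big) + \sqrt{bl}\,\big(U_n^*(h_2) - E^*[U_n^*(h_2)]\big).$$

By Theorem 2.3 of Shao and Yu [25],

$$\mathrm{Var}^*\Big[\frac{1}{\sqrt{bl}}\sum_{i=1}^{bl} h_1(X_i^*)\Big] - \mathrm{Var}\Big[\frac{1}{\sqrt{n}}\sum_{i=1}^{n} h_1(X_i)\Big] \to 0.$$

By Lemmas 3.4 or 3.6 and 3.7,

$$\mathrm{Var}\big[\sqrt{n}\,U_n(h_2)\big] \to 0, \qquad (15)$$
$$\mathrm{Var}^*\big[\sqrt{bl}\,U_n^*(h_2)\big] \xrightarrow{P} 0. \qquad (16)$$

Together with the Cauchy-Schwarz inequality for the mixed terms, this proves line (2).

To prove line (3), note that every subsequence of $\mathrm{Var}^*\big[\sqrt{bl}\,U_n^*(h_2)\big]$ has a further subsequence $(n_k)_{k\in\mathbb{N}}$ along which it converges to 0 almost surely. By Slutsky's lemma, along this subsequence

$$P\left[\sup_{x\in\mathbb{R}}\Big|P^*\big[\sqrt{bl}\,\big(U_n^*(h) - E^*[U_n^*(h)]\big) \le x\big] - P^*\Big[\frac{2}{\sqrt{bl}}\sum_{i=1}^{bl}\big(h_1(X_i^*) - E^*[h_1(X_i^*)]\big) \le x\Big]\Big| \xrightarrow{k\to\infty} 0\right] = 1.$$

From Lemma 3.4 or 3.6 and Slutsky's lemma it follows that

$$\sup_{x\in\mathbb{R}}\Big|P\big[\sqrt{n}\,\big(U_n(h) - \theta\big) \le x\big] - P\Big[\frac{2}{\sqrt{n}}\sum_{i=1}^{n} h_1(X_i) \le x\Big]\Big| \to 0.$$

With Theorem 2.4 of Shao and Yu [25] and the triangle inequality, (3) holds almost surely along the subsequence $(n_k)_{k\in\mathbb{N}}$. Since the subsequence was arbitrary, (3) holds in probability. □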

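Proof of Theorem 2.2. Since $\mathrm{Var}^*\big[\sqrt{bl}\,U_n^*(h_2)\big] \le E^*[bl\,U_n^{*2}(h_2)]$, Jensen's inequality, the Chebyshev inequality and Lemma 3.8 give

$$P\Big[\mathrm{Var}^*\big[\sqrt{bl}\,U_n^*(h_2)\big] > \epsilon\Big] \le \frac{1}{\epsilon^2}E\big[E^*[(bl)^2\, U_n^{*4}(h_2)]\big] = O\big(n^{-1-\eta'}\big).$$

As these probabilities are summable, the convergence in line (16) holds almost surely under these conditions. □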

Proof of Corollary 2.3. By Theorem 3.3 of Lahiri [19], the rate of convergence follows for the variance of $\frac{1}{\sqrt{bl}}\sum_{i=1}^{bl} h_1(X_i^*)$. The faster convergence to zero of $(bl)^2\, U_n^{*4}(h_2)$ (Lemma 3.8) completes the proof. □